-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
add array element mutex offset in print and gc #58997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Does this also need 1.11 backport? |
julia> struct CxRL
condvar::Condition
x::Int
lock::ReentrantLock
CxRL() = new(Condition(), 5, ReentrantLock())
end
julia> struct OnceHolder
once::OncePerThread{CxRL}
OnceHolder() = new(OncePerThread{CxRL}(CxRL))
end
julia> holder = OnceHolder()
OnceHolder(OncePerThread{CxRL, Type{CxRL}}(CxRL[], UInt8[], CxRL))
julia> if length(ARGS) > 0
display(holder)
end
julia> holder.once()
CxRL(Condition(), 5, ReentrantLock())
julia> holder.once.xs[2] This still hangs |
Alright, now includes the additional revert required to fix that bug too. |
The only issue with reverting that is a performance regression due to llvm's cost modeling pessimizing vectorization if the GEP calculation happens outside the GEP (it's their fault and maybe they will fix this in the future but this makes a bunch of operations regress) #57389 |
According to #59030 (comment) this doesn't fix the issue |
@@ -4820,9 +4820,10 @@ static jl_cgval_t emit_memoryref(jl_codectx_t &ctx, const jl_cgval_t &ref, jl_cg | |||
setName(ctx.emission_context, ovflw, "memoryref_ovflw"); | |||
} | |||
#endif | |||
Type *elty = isboxed ? ctx.types().T_prjlvalue : julia_type_to_llvm(ctx, jl_tparam1(ref.typ)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no way to implement this? I guess julia_type_to_llvm
doesn't make the lock FCA? As I said this will pessimize vectorization a lot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the fact that this pessimizes vectorization is just that LLVM models complex GEPS differently from muls/shifts (even though the backend will probably generate similar code (it shouldn't be hard to pattern match)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a potential problem for the vectorizer, can't we just use the old GEP code if the memory is not atomic?
Performance with locks is abysmal anyways; the remaining special case where the GEP code is invalid is stuff like NTuple{3,Int8}
that must be represented as 4 bytes for AtomicMemory.
In view of current experience with llvm refusing to optimize atomics, I view it as unlikely that the autovectorizer dares to touch them, and we can revisit that in the future if anybody complains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that sounds like a good idea for now
src/rtutils.c
Outdated
@@ -1290,10 +1290,20 @@ static size_t jl_static_show_x_(JL_STREAM *out, jl_value_t *v, jl_datatype_t *vt | |||
for (size_t j = 0; j < tlen; j++) { | |||
if (layout->flags.arrayelem_isboxed) { | |||
jl_value_t **ptr = ((jl_value_t**)m->ptr) + j; | |||
if (layout->flags.arrayelem_islocked) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait a second, boxed elements should never be locked, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I think claude might have been too eager to use its new bit everywhere
The layout and printing logic need to correctly offset and align the inset fields to account for the per-element mutex of an atomic array with large elements.
Fix #58993